Method and device for performance-based progression of virtual content

ABSTRACT

In some implementations, a method includes: obtaining first input data while presenting a computer-generated reality (CGR) environment from the perspective of a first character associated with a first time slice within predetermined content; determining whether or not the first input data satisfies first performance criteria associated with the first character for the first time slice within the predetermined content; and, in response to determining that the first input data satisfies the first performance criteria associated with the first character for the first time slice within the predetermined content, updating the CGR environment from the perspective of the first character associated with a second time slice within the predetermined content.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims the benefit of U.S. Provisional Patent App. No. 62/861,893, filed on Jun. 14, 2019, which is incorporated by reference in its entirety.

TECHNICAL FIELD

The present disclosure generally relates to virtual content (sometimes also referred to herein as “computer-generated reality (CGR) content”), and in particular, to systems, methods, and devices for performance-based progression of virtual content.

BACKGROUND

Virtual reality (VR) and augmented reality (AR) are becoming more popular due to their remarkable ability to alter a user's perception of the world. For example, VR and AR are used for learning purposes, gaming purposes, content creation purposes, social media and interaction purposes, or the like. These technologies differ in the user's perception of his/her presence. VR transposes the user into a virtual space, so his/her VR perception is different from his/her real-world perception. In contrast, AR takes the user's real-world perception and adds something to it.

These technologies are becoming more commonplace due to, for example, miniaturization of hardware components, improvements to hardware performance, and improvements to software efficiency. As one example, a user may experience AR content superimposed on a live video feed of the user's environment on a handheld display (e.g., an AR-enabled mobile phone or tablet with video pass-through). As another example, a user may experience AR content by wearing a near-eye system or head-mountable enclosure that still allows the user to see his/her surroundings (e.g., glasses with optical see-through). As yet another example, a user may experience VR content by using a near-eye system that encloses the user's field-of-view and is tethered to a computer.

BRIEF DESCRIPTION OF THE DRAWINGS

So that the present disclosure can be understood by those of ordinary skill in the art, a more detailed description may be had by reference to aspects of some illustrative implementations, some of which are shown in the accompanying drawings.

FIG. 1 is a block diagram of an example operating architecture in accordance with some implementations.

FIG. 2 is a block diagram of an example controller in accordance with some implementations.

FIG. 3 is a block diagram of an example electronic device in accordance with some implementations.

FIG. 4 is a block diagram of an example data processing architecture in accordance with some implementations.

FIG. 5A illustrates an overview timeline associated with predetermined content in accordance with some implementations.

FIG. 5B illustrates a sub-timeline associated with a thematic scene within the predetermined content in accordance with some implementations.

FIG. 5C illustrates example template characterization vectors in accordance with some implementations.

FIG. 5D illustrates an example input characterization vector in accordance with some implementations.

FIG. 5E illustrates example abstract representations associated with portions of the input characterization vector in FIG. 5D in accordance with some implementations.

FIG. 6A illustrates the example abstract representation of the body pose portion of the input characterization vector in FIG. 5D relative to an acceptability threshold in accordance with some implementations.

FIG. 6B illustrates another example abstract representation of a body pose portion of an input characterization vector relative to an acceptability threshold in accordance with some implementations.

FIG. 7 is a flowchart representation of a method of performance-based progression of CGR content in accordance with some implementations.

FIGS. 8A and 8B illustrate a flowchart representation of a method of performance-based progression of CGR content in accordance with some implementations.

In accordance with common practice the various features illustrated in the drawings may not be drawn to scale. Accordingly, the dimensions of the various features may be arbitrarily expanded or reduced for clarity. In addition, some of the drawings may not depict all of the components of a given system, method or device. Finally, like reference numerals may be used to denote like features throughout the specification and figures.

SUMMARY

Various implementations disclosed herein include devices, systems, and methods for performance-based progression of computer-generated reality (CGR) content. According to some implementations, the method is performed at a device including non-transitory memory and one or more processors coupled with the non-transitory memory. The method includes: obtaining first input data while presenting a CGR environment from the perspective of a first character associated with a first time slice within predetermined content; determining whether or not the first input data satisfies first performance criteria associated with the first character for the first time slice within the predetermined content; and, in response to determining that the first input data satisfies the first performance criteria associated with the first character for the first time slice within the predetermined content, updating the CGR environment from the perspective of the first character associated with a second time slice within the predetermined content.
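
By way of a non-limiting illustration, the following Python sketch models the claimed flow: input data is obtained for the current time slice, tested against the performance criteria for the current character, and the content advances to the next time slice only when the criteria are satisfied. All identifiers (progress_content, meets_criteria, and so on) are hypothetical placeholders rather than names from this disclosure, and a scalar "performance score" stands in for the richer input characterization described below.

```python
# A minimal sketch of performance-based progression; every name here is
# hypothetical and the callbacks stand in for the engines described below.

def progress_content(time_slices, character, get_input, meets_criteria, present):
    """Advance through predetermined content one time slice at a time."""
    index = 0
    while index < len(time_slices):
        present(time_slices[index], character)        # update the CGR environment
        data = get_input()                            # e.g., audio, body pose, eye tracking
        if meets_criteria(data, time_slices[index], character):
            index += 1                                # criteria satisfied: next time slice
        # otherwise the current time slice is maintained for a reattempt

if __name__ == "__main__":
    slices = ["slice-515A", "slice-515B"]
    scores = iter([0.2, 0.9, 0.8])                    # hypothetical per-attempt scores
    progress_content(
        slices,
        character="532A",
        get_input=lambda: next(scores),
        meets_criteria=lambda score, ts, ch: score >= 0.5,
        present=lambda ts, ch: print(f"presenting {ts} as character {ch}"),
    )
```

In this sketch, the first attempt at slice-515A fails and the slice is re-presented; the second attempt succeeds and the content progresses to slice-515B.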

In accordance with some implementations, a device includes one or more processors, a non-transitory memory, and one or more programs; the one or more programs are stored in the non-transitory memory and configured to be executed by the one or more processors and the one or more programs include instructions for performing or causing performance of any of the methods described herein. In accordance with some implementations, a non-transitory computer readable storage medium has stored therein instructions, which, when executed by one or more processors of a device, cause the device to perform or cause performance of any of the methods described herein. In accordance with some implementations, a device includes: one or more processors, a non-transitory memory, and means for performing or causing performance of any of the methods described herein.

DESCRIPTION

Numerous details are described in order to provide a thorough understanding of the example implementations shown in the drawings. However, the drawings merely show some example aspects of the present disclosure and are therefore not to be considered limiting. Those of ordinary skill in the art will appreciate that other effective aspects and/or variants do not include all of the specific details described herein. Moreover, well-known systems, methods, components, devices and circuits have not been described in exhaustive detail so as not to obscure more pertinent aspects of the example implementations described herein.

A physical environment refers to a physical world that people can sense and/or interact with without aid of electronic systems. Physical environments, such as a physical park, include physical articles, such as physical trees, physical buildings, and physical people. People can directly sense and/or interact with the physical environment, such as through sight, touch, hearing, taste, and smell.

In contrast, a computer-generated reality (CGR) environment refers to a wholly or partially simulated environment that people sense and/or interact with via an electronic system. In CGR, a subset of a person's physical motions, or representations thereof, are tracked, and, in response, one or more characteristics of one or more CGR objects simulated in the CGR environment are adjusted in a manner that comports with at least one law of physics. For example, a CGR system may detect a person's head turning and, in response, adjust graphical content and an acoustic field presented to the person in a manner similar to how such views and sounds would change in a physical environment. In some situations (e.g., for accessibility reasons), adjustments to characteristic(s) of CGR object(s) in a CGR environment may be made in response to representations of physical motions (e.g., vocal commands).

A person may sense and/or interact with a CGR object using any one of their senses, including sight, sound, touch, taste, and smell. For example, a person may sense and/or interact with audio objects that create a 3D or spatial audio environment that provides the perception of point audio sources in 3D space. In another example, audio objects may enable audio transparency, which selectively incorporates ambient sounds from the physical environment with or without computer-generated audio. In some CGR environments, a person may sense and/or interact only with audio objects.

A virtual reality (VR) environment refers to a simulated environment that is designed to be based entirely on computer-generated sensory inputs for one or more senses. A VR environment comprises a plurality of virtual objects with which a person may sense and/or interact. For example, computer-generated imagery of trees, buildings, and avatars representing people are examples of virtual objects. A person may sense and/or interact with virtual objects in the VR environment through a simulation of the person's presence within the computer-generated environment, and/or through a simulation of a subset of the person's physical movements within the computer-generated environment.

In contrast to a VR environment, which is designed to be based entirely on computer-generated sensory inputs, a mixed reality (MR) environment refers to a simulated environment that is designed to incorporate sensory inputs from the physical environment, or a representation thereof, in addition to including computer-generated sensory inputs (e.g., virtual objects). On a virtuality continuum, a mixed reality environment is anywhere between, but not including, a wholly physical environment at one end and a virtual reality environment at the other end.

In some MR environments, computer-generated sensory inputs may respond to changes in sensory inputs from the physical environment. Also, some electronic systems for presenting an MR environment may track location and/or orientation with respect to the physical environment to enable virtual objects to interact with real-world objects (that is, physical articles from the physical environment or representations thereof). For example, a system may account for movements so that a virtual tree appears stationary with respect to the physical ground.

An augmented reality (AR) environment refers to a simulated environment in which one or more virtual objects are superimposed over a physical environment, or a representation thereof. For example, an electronic system for presenting an AR environment may have a transparent or translucent display through which a person may directly view the physical environment. The system may be configured to present virtual objects on the transparent or translucent display, so that a person, using the system, perceives the virtual objects superimposed over the physical environment. Alternatively, a system may have an opaque display and one or more imaging sensors that capture images or video of the physical environment, which are representations of the physical environment. The system composites the images or video with virtual objects and presents the composition on the opaque display. A person, using the system, indirectly views the physical environment by way of the images or video of the physical environment, and perceives the virtual objects superimposed over the physical environment. As used herein, a video of the physical environment shown on an opaque display is called “pass-through video,” meaning a system uses one or more image sensor(s) to capture images of the physical environment and uses those images in presenting the AR environment on the opaque display. Further alternatively, a system may have a projection system that projects virtual objects into the physical environment, for example, as a hologram or on a physical surface, so that a person, using the system, perceives the virtual objects superimposed over the physical environment.

An augmented reality environment also refers to a simulated environment in which a representation of a physical environment is transformed by computer-generated sensory information. For example, in providing pass-through video, a system may transform one or more sensor images to impose a select perspective (e.g., viewpoint) different than the perspective captured by the imaging sensors. As another example, a representation of a physical environment may be transformed by graphically modifying (e.g., enlarging) portions thereof, such that the modified portions may be representative but not photorealistic versions of the originally captured images. As a further example, a representation of a physical environment may be transformed by graphically eliminating or obfuscating portions thereof.

An augmented virtuality (AV) environment refers to a simulated environment in which a virtual or computer-generated environment incorporates one or more sensory inputs from the physical environment. The sensory inputs may be representations of one or more characteristics of the physical environment. For example, an AV park may have virtual trees and virtual buildings, but people with faces photorealistically reproduced from images taken of physical people. As another example, a virtual object may adopt a shape or color of a physical article imaged by one or more imaging sensors. As a further example, a virtual object may adopt shadows consistent with the position of the sun in the physical environment.

There are many different types of electronic systems that enable a person to sense and/or interact with various CGR environments. Examples include near-eye systems, projection-based systems, heads-up displays (HUDs), vehicle windshields having integrated display capability, windows having integrated display capability, displays formed as lenses designed to be placed on a person's eyes (e.g., similar to contact lenses), headphones/earphones, speaker arrays, input systems (e.g., wearable or handheld controllers with or without haptic feedback), smartphones, tablets, and desktop/laptop computers. A near-eye system may have one or more speaker(s) and an integrated opaque display. Alternatively, a near-eye system may be configured to accept an external opaque display (e.g., a smartphone). The near-eye system may incorporate one or more imaging sensors to capture images or video of the physical environment, and/or one or more microphones to capture audio of the physical environment. Rather than an opaque display, a near-eye system may have a transparent or translucent display. The display may utilize digital light projection, micro-electromechanical systems (MEMS), digital micromirror devices (DMDs), organic light-emitting diodes (OLEDs), light-emitting diodes (LEDs), micro-light-emitting diodes (μLEDs), liquid crystal on silicon (LCoS), laser scanning light source, or any combination of these technologies. The medium may be an optical waveguide, a hologram medium, an optical combiner, an optical reflector, or any combination thereof. In one implementation, the transparent or translucent display may be configured to become opaque selectively. Projection-based systems may employ retinal projection technology that projects graphical images onto a person's retina. Projection systems also may be configured to project virtual objects into the physical environment, for example, as a hologram or on a physical surface.

FIG. 1 is a block diagram of an example operating architecture 100 in accordance with some implementations. While pertinent features are shown, those of ordinary skill in the art will appreciate from the present disclosure that various other features have not been illustrated for the sake of brevity and so as not to obscure more pertinent aspects of the example implementations disclosed herein. To that end, as a non-limiting example, the operating architecture 100 includes an optional controller 110 and an electronic device 120 (e.g., a tablet, mobile phone, laptop, wearable computing device, or the like).

In some implementations, the controller 110 is configured to manage and coordinate a CGR experience (sometimes also referred to herein as a “CGR environment”) for a user 150 and zero or more other users. In some implementations, the controller 110 includes a suitable combination of software, firmware, and/or hardware. The controller 110 is described in greater detail below with respect to FIG. 2. In some implementations, the controller 110 is a computing device that is local or remote relative to the physical environment 105. For example, the controller 110 is a local server located within the physical environment 105. In another example, the controller 110 is a remote server located outside of the physical environment 105 (e.g., a cloud server, central server, etc.). In some implementations, the controller 110 is communicatively coupled with the electronic device 120 via one or more wired or wireless communication channels 144 (e.g., BLUETOOTH, IEEE 802.11x, IEEE 802.16x, IEEE 802.3x, etc.). In some implementations, the functions of the controller 110 are provided by the electronic device 120. As such, in some implementations, the components of the controller 110 are integrated into the electronic device 120.

In some implementations, the electronic device 120 is configured to present audio and/or video content to the user 150. In some implementations, the electronic device 120 is configured to present the CGR experience to the user 150. In some implementations, the electronic device 120 includes a suitable combination of software, firmware, and/or hardware. The electronic device 120 is described in greater detail below with respect to FIG. 3.

According to some implementations, the electronic device 120 presents a computer-generated reality (CGR) experience to the user 150 while the user 150 is physically present within a physical environment 105 that includes a table 107 within the field-of-view 111 of the electronic device 120. As such, in some implementations, the user 150 holds the electronic device 120 in his/her hand(s). In some implementations, while presenting the CGR experience, the electronic device 120 is configured to present CGR content (e.g., a CGR cylinder 109) and to enable video pass-through of the physical environment 105 (e.g., including the table 107) on a display 122. For example, the electronic device 120 corresponds to a mobile phone, tablet, laptop, wearable computing device, or the like.

In some implementations, the display 122 corresponds to an additive display that enables optical see-through of the physical environment 105 including the table 107. For example, the display 122 corresponds to a transparent lens, and the electronic device 120 corresponds to a pair of glasses worn by the user 150. As such, in some implementations, the electronic device 120 presents a user interface (e.g., the CGR environment 128) by projecting the CGR content (e.g., the CGR cylinder 109) onto the additive display, which is, in turn, overlaid on the physical environment 105 from the perspective of the user 150. In some implementations, the electronic device 120 presents the user interface by displaying the CGR content (e.g., the CGR cylinder 109) on the additive display, which is, in turn, overlaid on the physical environment 105 from the perspective of the user 150.

In some implementations, the user 150 wears the electronic device 120, such as a near-eye system. As such, the electronic device 120 includes one or more displays provided to display the CGR content (e.g., a single display or one for each eye). For example, the electronic device 120 encloses the field-of-view of the user 150. In such implementations, the electronic device 120 presents the CGR environment 128 by displaying data corresponding to the CGR environment 128 on the one or more displays or by projecting data corresponding to the CGR environment 128 onto the retinas of the user 150.

In some implementations, the electronic device 120 includes an integrated display (e.g., a built-in display) that displays the CGR environment 128. In some implementations, the electronic device 120 includes a head-mountable enclosure. In various implementations, the head-mountable enclosure includes an attachment region to which another device with a display can be attached. For example, in some implementations, the electronic device 120 can be attached to the head-mountable enclosure. In various implementations, the head-mountable enclosure is shaped to form a receptacle for receiving another device that includes a display (e.g., the electronic device 120). For example, in some implementations, the electronic device 120 slides/snaps into or otherwise attaches to the head-mountable enclosure. In some implementations, the display of the device attached to the head-mountable enclosure presents (e.g., displays) the CGR environment 128. In some implementations, the electronic device 120 is replaced with a CGR chamber, enclosure, or room configured to present CGR content in which the user 150 does not wear the electronic device 120.

In some implementations, the controller 110 and/or the electronic device 120 cause a CGR representation of the user 150 to move within the CGR environment 128 based on movement information (e.g., body pose data, eye tracking data, hand tracking data, etc.) from the electronic device 120 and/or optional remote input devices within the physical environment 105. In some implementations, the optional remote input devices correspond to fixed or movable sensory equipment within the physical environment 105 (e.g., image sensors, depth sensors, infrared (IR) sensors, event cameras, microphones, etc.). In some implementations, each of the remote input devices is configured to collect/capture input data and provide the input data to the controller 110 and/or the electronic device 120 while the user 150 is physically within the physical environment 105. In some implementations, the remote input devices include microphones, and the input data includes audio data associated with the user 150 (e.g., speech samples). In some implementations, the remote input devices include image sensors (e.g., cameras), and the input data includes images of the user 150. In some implementations, the input data characterizes body poses of the user 150 at different times. In some implementations, the input data characterizes head poses of the user 150 at different times. In some implementations, the input data characterizes hand tracking information associated with the hands of the user 150 at different times. In some implementations, the input data characterizes the velocity and/or acceleration of body parts of the user 150 such as his/her hands. In some implementations, the input data indicates joint positions and/or joint orientations of the user 150. In some implementations, the remote input devices include feedback devices such as speakers, lights, or the like.

FIG. 2 is a block diagram of an example of the controller 110 in accordance with some implementations. While certain specific features are illustrated, those skilled in the art will appreciate from the present disclosure that various other features have not been illustrated for the sake of brevity, and so as not to obscure more pertinent aspects of the implementations disclosed herein. To that end, as a non-limiting example, in some implementations, the controller 110 includes one or more processing units 202 (e.g., microprocessors, application-specific integrated-circuits (ASICs), field-programmable gate arrays (FPGAs), graphics processing units (GPUs), central processing units (CPUs), processing cores, and/or the like), one or more input/output (I/O) devices 206, one or more communication interfaces 208 (e.g., universal serial bus (USB), IEEE 802.3x, IEEE 802.11x, IEEE 802.16x, global system for mobile communications (GSM), code division multiple access (CDMA), time division multiple access (TDMA), global positioning system (GPS), infrared (IR), BLUETOOTH, ZIGBEE, and/or the like type interface), one or more programming (e.g., I/O) interfaces 210, a memory 220, and one or more communication buses 204 for interconnecting these and various other components.

In some implementations, the one or more communication buses 204 include circuitry that interconnects and controls communications between system components. In some implementations, the one or more I/O devices 206 include at least one of a keyboard, a mouse, a touchpad, a joystick, one or more microphones, one or more speakers, one or more image sensors, one or more displays, and/or the like.

The memory 220 includes high-speed random-access memory, such as dynamic random-access memory (DRAM), static random-access memory (SRAM), double-data-rate random-access memory (DDR RAM), or other random-access solid-state memory devices. In some implementations, the memory 220 includes non-volatile memory, such as one or more magnetic disk storage devices, optical disk storage devices, flash memory devices, or other non-volatile solid-state storage devices. The memory 220 optionally includes one or more storage devices remotely located from the one or more processing units 202. The memory 220 comprises a non-transitory computer readable storage medium. In some implementations, the memory 220 or the non-transitory computer readable storage medium of the memory 220 stores the following programs, modules and data structures, or a subset thereof including an optional operating system 230, a data obtainer 242, an input characterization engine 250, a computer-generated reality (CGR) experience engine 260, and a data transmitter 244.

The operating system 230 includes procedures for handling various basic system services and for performing hardware-dependent tasks.

In some implementations, the data obtainer 242 is configured to obtain data (e.g., presentation data, user interaction data, sensor data, location data, etc.) from at least one of the I/O devices 206 of the controller 110, the electronic device 120, and the optional remote input devices. To that end, in various implementations, the data obtainer 242 includes instructions and/or logic therefor, and heuristics and metadata therefor.

In some implementations, the input characterization engine 250 is configured to generate an input characterization vector for a respective time slice based on input data (e.g., audio data, body pose data, and eye tracking data, which are sometimes collectively referred to herein as “sensor data”) obtained from sensors and/or input devices of the controller 110, the electronic device 120, and/or the optional remote input devices. To that end, in various implementations, the input characterization engine 250 includes a natural language processor (NLP) 252, a speech assessor 254, a body pose interpreter 256, and a gaze direction determiner 258. FIG. 5D, described in more detail below, illustrates an input characterization vector 550 associated with a respective time slice-character tuple (e.g., time slice 515A-character 532A).

In some implementations, the input characterization vector includes a dialogue portion that corresponds to the output from the NLP 252. In some implementations, the input characterization vector includes a dialogue delivery portion that corresponds to the output from the speech assessor 254. In some implementations, the input characterization vector includes a body pose portion that corresponds to the output from the body pose interpreter 256. In some implementations, the input characterization vector includes an eye tracking portion that corresponds to the output from the gaze direction determiner 258.
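
By way of a non-limiting illustration, one possible shape for such a vector is sketched below; the field types are assumptions, since the disclosure specifies only that the vector carries one portion per engine output plus the correlated identifiers discussed with respect to FIG. 4.

```python
# A sketch of the input characterization vector; field types are assumed.
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class InputCharacterizationVector:
    dialogue: str                      # output of the NLP 252 (recognized speech)
    dialogue_delivery: List[float]     # speech characteristics from the speech assessor 254
    body_pose: List[float]             # pose characteristics from the body pose interpreter 256
    eye_tracking: Tuple[float, float]  # gaze direction from the gaze direction determiner 258
    time_slice: str                    # time slice indicator, e.g., "515A"
    character: str                     # character identifier, e.g., "532A"
```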

In some implementations, the NLP 252 is configured to perform natural language processing (or another speech recognition technique) on audio data in order to generate the dialogue portion of the input characterization vector. To that end, in various implementations, the NLP 252 includes instructions and/or logic therefor, and heuristics and metadata therefor.

In some implementations, the speech assessor 254 is configured to determine one or more speech characteristics associated with the audio data (e.g., intonation, cadence, accent, diction, articulation, pronunciation, and/or the like) in order to generate the dialogue delivery portion of the input characterization vector. To that end, in various implementations, the speech assessor 254 includes instructions and/or logic therefor, and heuristics and metadata therefor.

In some implementations, the body pose interpreter 256 is configured to determine one or more pose characteristics associated with the body pose data in order to generate the body pose portion of the input characterization vector. To that end, in various implementations, the body pose interpreter 256 includes instructions and/or logic therefor, and heuristics and metadata therefor.

In some implementations, the gaze direction determiner 258 is configured to determine a directionality vector associated with the eye tracking data (e.g., X, Y, and/or focal point coordinates) in order to generate the eye tracking portion of the input characterization vector. To that end, in various implementations, the gaze direction determiner 258 includes instructions and/or logic therefor, and heuristics and metadata therefor.

In some implementations, the CGR experience engine 260 is configured to manage and coordinate one or more CGR experiences for one or more users (e.g., a single CGR experience for one or more users, or multiple CGR experiences for respective groups of one or more users). To that end, in various implementations, the CGR experience engine 260 includes a mapper and locator engine 262, a template selector 264, a performance assessment engine 268, and a CGR content manager 270.

In some implementations, the mapper and locator engine 262 is configured to map the physical environment 105 and to track the position/location of at least the electronic device 120 with respect to the physical environment 105. To that end, in various implementations, the mapper and locator engine 262 includes instructions and/or logic therefor, and heuristics and metadata therefor.

In some implementations, the template selector 264 is configured to select a template characterization vector from a template library 266 based on a current time slice-character tuple. To that end, in various implementations, the template selector 264 includes instructions and/or logic therefor, and heuristics and metadata therefor.

In some implementations, the template library 266 includes a first plurality of template characterization vectors for first predetermined content (e.g., CGR content A, movie A, or the like) indexed based on time slice-character tuples associated with the first predetermined content. In some implementations, the template library 266 further includes a second plurality of template characterization vectors for second predetermined content (e.g., CGR content B, movie B, or the like) indexed based on time slice-character tuples associated with the second predetermined content. FIG. 5C, described in more detail below, illustrates a first template characterization vector 520A associated with a first time slice-character tuple (e.g., time slice 515A-character 532A) and a second template characterization vector 520B associated with a second time slice-character tuple (e.g., time slice 515A-character 532B). As shown in FIG. 5C, both the first template characterization vector 520A and the second template characterization vector 520B are associated with a same time slice 515A of the predetermined content 530, but the first template characterization vector 520A and the second template characterization vector 520B are associated with different characters: the character 532A and the character 532B, respectively.
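
By way of a non-limiting illustration, the indexing scheme can be pictured as a mapping keyed by tuples, as sketched below; the (content, time slice, character) key and the string values are illustrative assumptions, not structures mandated by this disclosure.

```python
# A sketch of template selection from a library indexed by
# time slice-character tuples; keys and values are illustrative.
template_library_266 = {
    ("content-530", "515A", "532A"): "template-520A",
    ("content-530", "515A", "532B"): "template-520B",
}

def select_template(content_id, time_slice, character):
    """Return the template characterization vector for the current tuple."""
    return template_library_266[(content_id, time_slice, character)]

print(select_template("content-530", "515A", "532B"))  # -> template-520B
```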

In some implementations, the performance assessment engine 268 is configured to determine whether the input data satisfies performance criteria associated with a respective time slice-character tuple based on a comparison between one or more portions of the input characterization vector for the respective time slice-character tuple and one or more portions of the selected template characterization vector associated with the respective time slice-character tuple. To that end, in various implementations, the performance assessment engine 268 includes instructions and/or logic therefor, and heuristics and metadata therefor.

In some implementations, the CGR content manager 270 is configured to update the CGR environment based on a next time slice when the input data satisfies the performance criteria associated with the current time slice-character tuple in order to progress to the next time slice. In some implementations, the CGR content manager 270 is configured to maintain the CGR environment based on the current time slice when the input data does not satisfy the performance criteria associated with the current time slice-character tuple in order to prepare for a reattempt of the current time slice. To that end, in various implementations, the CGR content manager 270 includes instructions and/or logic therefor, and heuristics and metadata therefor.

In some implementations, the data transmitter 244 is configured to transmit data (e.g., presentation data, location data, etc.) to at least the electronic device 120. To that end, in various implementations, the data transmitter 244 includes instructions and/or logic therefor, and heuristics and metadata therefor.

Although the data obtainer 242, the input characterization engine 250, the CGR experience engine 260, and the data transmitter 244 are shown as residing on a single device (e.g., the controller 110), it should be understood that in other implementations, any combination of the data obtainer 242, the input characterization engine 250, the CGR experience engine 260, and the data transmitter 244 may be located in separate computing devices.

In some implementations, the functions and/or components of the controller 110 are combined with or provided by the electronic device 120 shown below in FIG. 3. Moreover, FIG. 2 is intended more as a functional description of the various features which may be present in a particular implementation as opposed to a structural schematic of the implementations described herein. As recognized by those of ordinary skill in the art, items shown separately could be combined and some items could be separated. For example, some functional modules shown separately in FIG. 2 could be implemented in a single module and the various functions of single functional blocks could be implemented by one or more functional blocks in various implementations. The actual number of modules and the division of particular functions and how features are allocated among them will vary from one implementation to another and, in some implementations, depends in part on the particular combination of hardware, software, and/or firmware chosen for a particular implementation.

FIG. 3 is a block diagram of an example of the electronic device 120 in accordance with some implementations. While certain specific features are illustrated, those skilled in the art will appreciate from the present disclosure that various other features have not been illustrated for the sake of brevity, and so as not to obscure more pertinent aspects of the implementations disclosed herein. To that end, as a non-limiting example, in some implementations, the electronic device 120 includes one or more processing units 302 (e.g., microprocessors, ASICs, FPGAs, GPUs, CPUs, processing cores, and/or the like), one or more input/output (I/O) devices and sensors 306, one or more communication interfaces 308 (e.g., USB, IEEE 802.3x, IEEE 802.11x, IEEE 802.16x, GSM, CDMA, TDMA, GPS, IR, BLUETOOTH, ZIGBEE, and/or the like type interface), one or more programming (e.g., I/O) interfaces 310, one or more displays 312, one or more optional interior- and/or exterior-facing image sensors 314, a memory 320, and one or more communication buses 304 for interconnecting these and various other components.

In some implementations, the one or more communication buses 304 include circuitry that interconnects and controls communications between system components. In some implementations, the one or more I/O devices and sensors 306 include at least one of an inertial measurement unit (IMU), an accelerometer, a gyroscope, a thermometer, one or more physiological sensors (e.g., blood pressure monitor, heart rate monitor, blood oxygen sensor, blood glucose sensor, etc.), one or more microphones, one or more speakers, a haptics engine, a heating and/or cooling unit, a skin shear engine, one or more depth sensors (e.g., structured light, time-of-flight, or the like), and/or the like.

In some implementations, the one or more displays 312 are configured to present the CGR experience to the user. In some implementations, the one or more displays 312 are also configured to present flat video content to the user (e.g., a 2-dimensional or “flat” AVI, FLV, WMV, MOV, MP4, or the like file associated with a TV episode or a movie, or live video pass-through of the physical environment 105). In some implementations, the one or more displays 312 correspond to holographic, digital light processing (DLP), liquid-crystal display (LCD), liquid-crystal on silicon (LCoS), organic light-emitting field-effect transistor (OLET), organic light-emitting diode (OLED), surface-conduction electron-emitter display (SED), field-emission display (FED), quantum-dot light-emitting diode (QD-LED), micro-electro-mechanical system (MEMS), and/or the like display types. In some implementations, the one or more displays 312 correspond to diffractive, reflective, polarized, holographic, etc. waveguide displays. For example, the electronic device 120 includes a single display. In another example, the electronic device 120 includes a display for each eye of the user. In some implementations, the one or more displays 312 are capable of presenting AR and VR content. In some implementations, the one or more displays 312 are capable of presenting AR or VR content.

In some implementations, the one or more optional interior- and/or exterior-facing image sensors 314 correspond to one or more RGB cameras (e.g., with a complementary metal-oxide-semiconductor (CMOS) image sensor or a charge-coupled device (CCD) image sensor), IR image sensors, event-based cameras, and/or the like.

The memory 320 includes high-speed random-access memory, such as DRAM, SRAM, DDR RAM, or other random-access solid-state memory devices. In some implementations, the memory 320 includes non-volatile memory, such as one or more magnetic disk storage devices, optical disk storage devices, flash memory devices, or other non-volatile solid-state storage devices. The memory 320 optionally includes one or more storage devices remotely located from the one or more processing units 302. The memory 320 comprises a non-transitory computer readable storage medium. In some implementations, the memory 320 or the non-transitory computer readable storage medium of the memory 320 stores the following programs, modules and data structures, or a subset thereof including an optional operating system 330 and a CGR presentation engine 340.

The operating system 330 includes procedures for handling various basic system services and for performing hardware-dependent tasks. In some implementations, the CGR presentation engine 340 is configured to present CGR content to the user via the one or more displays 312. To that end, in various implementations, the CGR presentation engine 340 includes a data obtainer 342, a CGR presenter 344, an interaction handler 346, and a data transmitter 350.

In some implementations, the data obtainer 342 is configured to obtain data (e.g., presentation data, user interaction data, sensor data, location data, etc.) from at least one of the I/O devices and sensors 306 of the electronic device 120, the controller 110, and the remote input devices. To that end, in various implementations, the data obtainer 342 includes instructions and/or logic therefor, and heuristics and metadata therefor.

In some implementations, the CGR presenter 344 is configured to present and update CGR content via the one or more displays 312. To that end, in various implementations, the CGR presenter 344 includes instructions and/or logic therefor, and heuristics and metadata therefor.

In some implementations, the interaction handler 346 is configured to detect and interpret user interactions with the presented CGR content. To that end, in various implementations, the interaction handler 346 includes instructions and/or logic therefor, and heuristics and metadata therefor.

In some implementations, the data transmitter 350 is configured to transmit data (e.g., presentation data, location data, user interaction data, etc.) to at least the controller 110. To that end, in various implementations, the data transmitter 350 includes instructions and/or logic therefor, and heuristics and metadata therefor.

Although the data obtainer 342, the CGR presenter 344, the interaction handler 346, and the data transmitter 350 are shown as residing on a single device (e.g., the electronic device 120), it should be understood that in other implementations, any combination of the data obtainer 342, the CGR presenter 344, the interaction handler 346, and the data transmitter 350 may be located in separate computing devices.

Moreover, FIG. 3 is intended more as a functional description of the various features which may be present in a particular implementation as opposed to a structural schematic of the implementations described herein. As recognized by those of ordinary skill in the art, items shown separately could be combined and some items could be separated. For example, some functional modules shown separately in FIG. 3 could be implemented in a single module and the various functions of single functional blocks could be implemented by one or more functional blocks in various implementations. The actual number of modules and the division of particular functions and how features are allocated among them will vary from one implementation to another and, in some implementations, depends in part on the particular combination of hardware, software, and/or firmware chosen for a particular implementation.

FIG. 4 illustrates an example data processing architecture 400 in accordance with some implementations. While pertinent features are shown, those of ordinary skill in the art will appreciate from the present disclosure that various other features have not been illustrated for the sake of brevity and so as not to obscure more pertinent aspects of the example implementations disclosed herein. In some implementations, the data processing architecture 400 is included in the controller 110 shown in FIGS. 1 and 2; the electronic device 120 shown in FIGS. 1 and 3; and/or a suitable combination thereof.

As shown in FIG. 4, the data processing architecture 400 obtains input data (e.g., sensor data) associated with a plurality of modalities, including audio data 402A, body pose data 402B, and eye tracking data 402C. For example, the audio data 402A corresponds to audio signals captured by one or more microphones of the controller 110, the electronic device 120, and/or the optional remote input devices. For example, the body pose data 402B corresponds to images captured by one or more image sensors of the controller 110, the electronic device 120, and/or the optional remote input devices. For example, the eye tracking data 402C corresponds to images captured by one or more image sensors of the controller 110, the electronic device 120, and/or the optional remote input devices.

According to some implementations, the audio data 402A corresponds to an ongoing or continuous time series of values. In turn, the time series converter 410 is configured to generate one or more temporal frames of audio data from a continuous stream of audio data. Each temporal frame of audio data includes a temporal portion of the audio data 402A. In some implementations, the time series converter 410 includes a windowing module 410A that is configured to mark and separate one or more temporal frames or portions of the audio data 402A for times T₁, T₂, . . . , T_N.

In some implementations, each temporal frame of the audio data 402A is conditioned by a pre-filter (not shown). For example, in some implementations, pre-filtering includes band-pass filtering to isolate and/or emphasize the portion of the frequency spectrum typically associated with human speech. In some implementations, pre-filtering includes pre-emphasizing portions of one or more temporal frames of the audio data in order to adjust the spectral composition of the one or more temporal frames of the audio data 402A. Additionally and/or alternatively, in some implementations, the windowing module 410A is configured to retrieve the audio data 402A from a non-transitory memory. Additionally and/or alternatively, in some implementations, pre-filtering includes filtering the audio data 402A using a low-noise amplifier (LNA) in order to substantially set a noise floor for further processing. In some implementations, a pre-filtering LNA is arranged prior to the time series converter 410. Those of ordinary skill in the art will appreciate that numerous other pre-filtering techniques may be applied to the audio data, and those highlighted herein are merely examples of numerous pre-filtering options available.

According to some implementations, the body pose data 402B corresponds to an ongoing or continuous time series of images or values. In turn, the time series converter 410 is configured to generate one or more temporal frames of body pose data from a continuous stream of body pose data. Each temporal frame of body pose data includes a temporal portion of the body pose data 402B. In some implementations, the time series converter 410 includes a windowing module 410A that is configured to mark and separate one or more temporal frames or portions of the body pose data 402B for times T₁, T₂, . . . , T_N. In some implementations, each temporal frame of the body pose data 402B is conditioned by a pre-filter or otherwise pre-processed (not shown).

According to some implementations, the eye tracking data 402C corresponds to an ongoing or continuous time series of images or values. In turn, the time series converter 410 is configured to generate one or more temporal frames of eye tracking data from a continuous stream of eye tracking data. Each temporal frame of eye tracking data includes a temporal portion of the eye tracking data 402C. In some implementations, the time series converter 410 includes a windowing module 410A that is configured to mark and separate one or more temporal frames or portions of the eye tracking data 402C for times T₁, T₂, . . . , T_N. In some implementations, each temporal frame of the eye tracking data 402C is conditioned by a pre-filter or otherwise pre-processed (not shown).
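
By way of a non-limiting illustration, the windowing performed by the time series converter 410 can be sketched as below, assuming fixed-length, non-overlapping frames; the disclosure fixes neither a frame length nor an overlap, so both are assumptions.

```python
# A sketch of windowing a continuous stream into temporal frames
# T1, T2, ..., TN; frame length and lack of overlap are assumptions.
from typing import List, Sequence

def window(samples: Sequence[float], frame_len: int) -> List[Sequence[float]]:
    """Mark and separate one or more temporal frames from a continuous stream."""
    return [samples[i:i + frame_len] for i in range(0, len(samples), frame_len)]

frames = window(list(range(10)), frame_len=4)
print(frames)  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```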

In various implementations, the data processing architecture 400 includes a privacy subsystem 420 that includes one or more privacy setting filters associated with user information and/or identifying information (e.g., at least some portions of the audio data 402A, the body pose data 402B, and/or the eye tracking data 402C). In some implementations, the privacy subsystem 420 selectively prevents and/or limits the data processing architecture 400 or portions thereof from obtaining and/or transmitting the user information. To this end, the privacy subsystem 420 receives user preferences and/or selections from the user in response to prompting the user for the same. In some implementations, the privacy subsystem 420 prevents the data processing architecture 400 from obtaining and/or transmitting the user information unless and until the privacy subsystem 420 obtains informed consent from the user. In some implementations, the privacy subsystem 420 anonymizes (e.g., scrambles or obscures) certain types of user information. For example, the privacy subsystem 420 receives user inputs designating which types of user information the privacy subsystem 420 anonymizes. As another example, the privacy subsystem 420 anonymizes certain types of user information likely to include sensitive and/or identifying information, independent of user designation (e.g., automatically).
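
By way of a non-limiting illustration, one simple privacy setting filter is sketched below, assuming an allow-list of modalities for which the user has given consent; the consent handling described above is more general than this.

```python
# A sketch of a privacy setting filter; the allow-list model is an assumption.
def filter_user_info(samples: dict, consented: set) -> dict:
    """Drop any input modality the user has not consented to share."""
    return {name: data for name, data in samples.items() if name in consented}

raw = {"audio_402A": [0.1, 0.2], "body_pose_402B": [0.3], "eye_tracking_402C": [0.5]}
print(filter_user_info(raw, consented={"body_pose_402B"}))  # only body pose survives
```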

In some implementations, the NLP 252 is configured to perform natural language processing (or another speech recognition technique) on the audio data 402A or one or more temporal frames thereof. For example, the NLP 252 includes a processing model (e.g., a hidden Markov model, a dynamic time warping algorithm, or the like) or a machine learning node (e.g., a neural network, convolutional neural network, recurrent neural network, deep neural network, support vector machine, random forest algorithm, or the like) that performs speech-to-text (STT) processing.

In some implementations, the speech assessor 254 is configured to determine one or more speech characteristics associated with the audio data 402A (or one or more temporal frames thereof). For example, the one or more speech characteristics correspond to intonation, cadence, accent, diction, articulation, pronunciation, and/or the like. For example, the speech assessor 254 performs speech segmentation on the audio data 402A in order to break the audio data 402A into words, syllables, phonemes, and/or the like and, subsequently, determines one or more speech characteristics therefor.

In some implementations, the body pose interpreter 256 is configured to determine one or more pose characteristics associated with the body pose data 402B (or one or more temporal frames thereof). For example, the body pose interpreter 256 determines an overall pose of the user (e.g., sitting, standing, crouching, etc.) for each sampling period (e.g., each image within the body pose data 402B) or predefined set of sampling periods (e.g., every N images within the body pose data 402B). For example, the body pose interpreter 256 determines rotational and/or translational coordinates for each joint, limb, and/or body portion of the user for each sampling period (e.g., each image within the body pose data 402B) or predefined set of sampling periods (e.g., every N images within the body pose data 402B). For example, the body pose interpreter 256 determines rotational and/or translational coordinates for specific body parts (e.g., head, hands, and/or the like) for each sampling period (e.g., each image within the body pose data 402B) or predefined set of sampling periods (e.g., every N images within the body pose data 402B).
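
By way of a non-limiting illustration, assigning an overall pose label per sampling period might look like the trivial classifier below; the head-height feature and the thresholds are assumptions rather than anything specified by this disclosure.

```python
# A sketch of coarse per-sample pose labeling; the feature and thresholds
# are assumed stand-ins for the body pose interpreter 256.
def overall_pose(head_height_m: float) -> str:
    """Classify a coarse overall pose from an assumed head-height feature."""
    if head_height_m < 0.9:
        return "crouching"
    return "sitting" if head_height_m < 1.3 else "standing"

print([overall_pose(h) for h in (0.8, 1.1, 1.7)])  # ['crouching', 'sitting', 'standing']
```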

In some implementations, the gaze direction determiner 258 is configured to determine a directionality vector associated with the eye tracking data 402C (or one or more temporal frames thereof). For example, the gaze direction determiner 258 determines a directionality vector (e.g., X, Y, and/or focal point coordinates) for each sampling period (e.g., each image within the eye tracking data 402C) or predefined set of sampling periods (e.g., every N images within the eye tracking data 402C).
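
By way of a non-limiting illustration, producing one directionality vector per N samples might be sketched as below, assuming each eye tracking sample is already an (x, y) gaze coordinate and that simple averaging is the aggregation; both are assumptions rather than disclosed behavior.

```python
# A sketch of producing one directionality vector per n samples;
# averaging is an assumed (not disclosed) aggregation.
def gaze_vectors(samples, n):
    """Average every n (x, y) gaze samples into one directionality vector."""
    out = []
    for i in range(0, len(samples), n):
        chunk = samples[i:i + n]
        out.append((sum(x for x, _ in chunk) / len(chunk),
                    sum(y for _, y in chunk) / len(chunk)))
    return out

print(gaze_vectors([(1, 2), (3, 4), (5, 6), (7, 8)], n=2))  # [(2.0, 3.0), (6.0, 7.0)]
```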

In some implementations, the input characterization engine 250 is configured to generate an input characterization vector based on the outputs from the NLP 252, the speech assessor 254, the body pose interpreter 256, and the gaze direction determiner 258. In some implementations, the input characterization vector includes a dialogue portion that corresponds to the output from the NLP 252. In some implementations, the input characterization vector includes a dialogue delivery portion that corresponds to the output from the speech assessor 254. In some implementations, the input characterization vector includes a body pose portion that corresponds to the output from the body pose interpreter 256. In some implementations, the input characterization vector includes an eye tracking portion that corresponds to the output from the gaze direction determiner 258.

In some implementations, the input characterization engine 250 is also configured to correlate the input characterization vector with at least a time slice indicator associated with a current time slice within the predetermined content and a character identifier associated with a current character within the predetermined content (sometimes also referred to herein as the time slice-character tuple). In some implementations, the input characterization engine 250 is also configured to correlate the input characterization vector with a content indicator associated with the predetermined content and a thematic scene indicator associated with a current thematic scene within the predetermined content. FIG. 5D, described in more detail below, illustrates an input characterization vector 550 associated with a respective time slice-character tuple (e.g., time slice 515A-character 532A).

In some implementations, the template selector 264 is configured to select a template characterization vector from the template library 266 based on the time slice-character tuple associated with the input characterization vector. In some implementations, the template library 266 includes a first plurality of template characterization vectors for first predetermined content (e.g., CGR content A, movie A, or the like) indexed based on time slice-character tuples associated with the first predetermined content. FIG. 5C, described in more detail below, illustrates a first template characterization vector 520A associated with a first time slice-character tuple (e.g., time slice 515A-character 532A) and a second template characterization vector 520B associated with a second time slice-character tuple (e.g., time slice 515A-character 532B).

In some implementations, the performance assessment engine 268 is configured to determine whether the input data (e.g., the audio data 402A, the body pose data 402B, and the eye tracking data 402C) satisfies performance criteria associated with a respective time slice-character tuple based on a comparison between one or more portions of the input characterization vector for the respective time slice-character tuple and one or more portions of the selected template characterization vector associated with the respective time slice-character tuple. In some implementations, the performance assessment engine 268 obtains a most recent input characterization vector from a data buffer 430 that stores the input characterization vectors in a first-in-first-out (FIFO) manner.

In some implementations, each portion of the input characterization vector is associated with a different input modality: the dialogue portion, the dialogue delivery portion, the body pose portion, the eye tracking portion, or the like. Similarly, in some implementations, each portion of the template characterization vector is associated with a different input modality: the dialogue portion, the dialogue delivery portion, the body pose portion, the eye tracking portion, or the like. In some implementations, the performance criteria associated with the respective time slice-character tuple are satisfied based on a comparison between the one or more portions of the input characterization vector for the respective time slice-character tuple and one or more portions of the selected template characterization vector. In some implementations, the performance criteria are satisfied when the one or more portions of the input characterization vector are within acceptability thresholds. For example, each portion (e.g., dialogue portion, dialogue delivery portion, body pose portion, eye tracking portion, or the like) may be associated with a different acceptability threshold. FIGS. 6A and 6B, described in more detail below, show an acceptability threshold 610 for body pose portions 556 and 556A, respectively.
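
By way of a non-limiting illustration, the per-portion comparison can be sketched as below, assuming each compared portion is a numeric vector and that each modality carries its own acceptability threshold on a Euclidean distance; the actual comparison measure is not specified by this disclosure.

```python
# A sketch of per-portion acceptability testing; Euclidean distance and
# the per-modality thresholds are assumptions.
import math

def portion_ok(input_portion, template_portion, threshold):
    """True if the input portion falls within the acceptability threshold."""
    return math.dist(input_portion, template_portion) <= threshold

def meets_performance_criteria(input_vec, template_vec, thresholds):
    """All compared portions must fall within their per-modality thresholds."""
    return all(
        portion_ok(input_vec[m], template_vec[m], t)
        for m, t in thresholds.items()   # e.g., "body_pose", "eye_tracking"
    )

inp = {"body_pose": [0.9, 0.1], "eye_tracking": [0.4, 0.6]}
tmp = {"body_pose": [1.0, 0.0], "eye_tracking": [0.5, 0.5]}
print(meets_performance_criteria(inp, tmp, {"body_pose": 0.25, "eye_tracking": 0.25}))  # True
```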

In some implementations, the CGR content manager 270 is configured to update the CGR environment presented via the CGR presentation pipeline 450 based on CGR content from the CGR content library 460 for a next time slice of the predetermined content when the input data (e.g., the audio data 402A, the body pose data 402B, and the eye tracking data 402C) satisfies the performance criteria associated with the current time slice-character tuple. In some implementations, the CGR content manager 270 is configured to maintain the CGR environment presented via the CGR presentation pipeline 450 based on the current time slice when the input data (e.g., the audio data 402A, the body pose data 402B, and the eye tracking data 402C) does not satisfy the performance criteria associated with the current time slice-character tuple and to prepare for a reattempt of the current time slice.

FIG. 5A illustrates an overview timeline 500 associated with predetermined content in accordance with some implementations. While pertinent features are shown, those of ordinary skill in the art will appreciate from the present disclosure that various other features have not been illustrated for the sake of brevity and so as not to obscure more pertinent aspects of the example implementations disclosed herein. The overview timeline 500 illustrates thematic scenes 505A, 505B, and 505C within predetermined content (e.g., a movie, TV episode, theatrical play, historical event, or the like). As one example, the thematic scene 505A spans from time 0:00:00 to time 0:30:00 (i.e., 30 minutes) within the predetermined content. Those of ordinary skill in the art will appreciate from the present disclosure that the overview timeline 500 is a non-limiting example and that the predetermined content may be divided into any number of thematic scenes of arbitrary length in various implementations.

FIG. 5B illustrates a sub-timeline 510 associated with the thematic scene 505A within the predetermined content in accordance with some implementations. While pertinent features are shown, those of ordinary skill in the art will appreciate from the present disclosure that various other features have not been illustrated for the sake of brevity and so as not to obscure more pertinent aspects of the example implementations disclosed herein. The sub-timeline 510 illustrates time slices 515A, 515B, 515C, 515D, 515E, . . . (sometimes collectively referred to herein as the “time slices 515”) within the thematic scene 505A. For example, the time slices 515 are determined based on keyframes associated with the thematic scene 505A. As another example, the time slices 515 are determined based on dialogue associated with the thematic scene 505A. As yet another example, the time slices 515 are determined based on actions associated with the thematic scene 505A. As still another example, the time slices 515 are determined based on a portion of the screenplay or the like for the predetermined content that corresponds to the thematic scene 505A. As one example, the time slice 515A spans from time 0:00:00 to time 0:00:08 (i.e., 8 seconds) within the predetermined content. Those of ordinary skill in the art will appreciate from the present disclosure that the sub-timeline 510 is a non-limiting example and that the thematic scene 505A may be divided into any number of time slices of arbitrary length in various implementations.
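
A concrete data layout can make the scene/slice hierarchy easier to picture. In the sketch below, the 30-minute scene and 8-second first slice mirror the examples in the text; the remaining slice boundaries are hypothetical.

```python
# Illustrative layout of an overview timeline: thematic scenes divided
# into time slices with (start, end) offsets in seconds.
timeline = {
    "505A": {
        "span_s": (0, 1800),        # 0:00:00 - 0:30:00
        "time_slices": {
            "515A": (0, 8),         # 0:00:00 - 0:00:08
            "515B": (8, 19),        # hypothetical boundaries; slices may
            "515C": (19, 31),       # follow keyframes, dialogue, actions,
        },                          # or the screenplay
    },
}
```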

FIG. 5C illustrates example template characterization vectors 520A and 520B in accordance with some implementations. While pertinent features are shown, those of ordinary skill in the art will appreciate from the present disclosure that various other features have not been illustrated for the sake of brevity and so as not to obscure more pertinent aspects of the example implementations disclosed herein.

As shown in FIG. 5C, the first template characterization vector 520A includes: a content field corresponding to a content name of or unique identifier for predetermined content 530, a scene field corresponding to a unique identifier for the thematic scene 505A within the predetermined content 530, a time slice field corresponding to a unique identifier for the time slice 515A within the predetermined content 530, and a character field corresponding to a name of or a unique identifier for a character 532A within the predetermined content 530. The first template characterization vector 520A also includes a dialogue portion 542A corresponding to dialogue or a set of lines associated with the character 532A for the time slice 515A.

As shown in FIG. 5C, the first template characterization vector 520A further includes a dialogue delivery portion 544A associated with one or more speech characteristics (e.g., intonation, cadence, accent, diction, articulation, pronunciation, and/or the like) associated with the delivery of the dialogue portion 542A by the character 532A for the time slice 515A. The first template characterization vector 520A further includes a body pose portion 546A associated with one or more pose characteristics associated with the character 532A for the time slice 515A. For example, the one or more pose characteristics correspond to an overall pose of the character 532A for the time slice 515A (e.g., sitting, standing, crouching, etc.). As another example, the one or more pose characteristics correspond to rotational and/or translational coordinates for each joint, limb, and/or body portion of the character 532A for the time slice 515A. As yet another example, the one or more pose characteristics correspond to rotational and/or translational coordinates for specific body parts (e.g., head, hands, and/or the like) of the character 532A for the time slice 515A. The first template characterization vector 520A further includes a gaze direction portion 548A associated with a directionality vector (e.g., X, Y, and/or focal point coordinates) for the gaze of the character 532A for the time slice 515A.

As shown in FIG. 5C, the first template characterization vector 520A further includes one or more other portions 549A characterizing the character 532A during the time slice 515A. Those of ordinary skill in the art will appreciate from the present disclosure that the first template characterization vector 520A is a non-limiting example and that the first template characterization vector 520A may include other sub-divisions, identifiers, and/or portions in various implementations.
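
One possible in-memory form of such a characterization vector is sketched below, covering the fields and portions enumerated above. The field types are assumptions; the same shape could also serve input characterization vectors such as the vector 550 of FIG. 5D.

```python
# Hypothetical in-memory form of a characterization vector; the
# disclosure does not fix a representation.
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class CharacterizationVector:
    content_id: str                       # e.g., predetermined content 530
    scene_id: str                         # e.g., thematic scene 505A
    time_slice_id: str                    # e.g., time slice 515A
    character_id: str                     # e.g., character 532A
    dialogue: str                         # set of lines for the tuple
    dialogue_delivery: List[float]        # speech characteristics signal
    body_pose: List[float]                # pose characteristics signal
    gaze_direction: Tuple[float, float]   # e.g., X/Y directionality
    other: Dict[str, List[float]] = field(default_factory=dict)  # portions 549A
```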

As shown in FIG. 5C, the second template characterization vector 520B includes: a content field corresponding to a content name of or unique identifier for predetermined content 530, a scene field corresponding to a unique identifier for the thematic scene 505A within the predetermined content 530, a time slice field corresponding to a unique identifier for the time slice 515A within the predetermined content 530, and a character field corresponding to a name of or unique identifier for a character 532B within the predetermined content 530. The second template characterization vector 520B also includes a dialogue portion 542B corresponding to dialogue or a set of lines associated with the character 532B for the time slice 515A.

As shown in FIG. 5C, the second template characterization vector 520B further includes a dialogue delivery portion 544B associated with one or more speech characteristics (e.g., intonation, cadence, accent, diction, articulation, pronunciation, and/or the like) associated with the delivery of the dialogue portion 542B by the character 532B for the time slice 515A. The second template characterization vector 520B further includes a body pose portion 546B associated with one or more pose characteristics associated with the character 532B for the time slice 515A. For example, the one or more pose characteristics correspond to an overall pose of the character 532B for the time slice 515A (e.g., sitting, standing, crouching, etc.). As another example, the one or more pose characteristics correspond to rotational and/or translational coordinates for each joint, limb, and/or body portion of the character 532B for the time slice 515A. As yet another example, the one or more pose characteristics correspond to rotational and/or translational coordinates for specific body parts (e.g., head, hands, and/or the like) of the character 532B for the time slice 515A. The second template characterization vector 520B further includes a gaze direction portion 548B associated with a directionality vector (e.g., X, Y, and/or focal point coordinates) for the gaze of the character 532B for the time slice 515A.

As shown in FIG. 5C, the second template characterization vector 520B further includes one or more other portions 549B characterizing the character 532B during the time slice 515A. Those of ordinary skill in the art will appreciate from the present disclosure that the second template characterization vector 520B is a non-limiting example and that the second template characterization vector 520B may include other sub-divisions, identifiers, and/or portions in various implementations. As shown in FIG. 5C, the first template characterization vector 520A is associated with a first time slice-character tuple (e.g., time slice 515A-character 532A). The second template characterization vector 520B is associated with a second time slice-character tuple (e.g., time slice 515A-character 532B) different from the first time slice-character tuple associated with the first template characterization vector 520A.

FIG. 5D illustrates an example input characterization vector 550 in accordance with some implementations. While pertinent features are shown, those of ordinary skill in the art will appreciate from the present disclosure that various other features have not been illustrated for the sake of brevity and so as not to obscure more pertinent aspects of the example implementations disclosed herein.

As shown in FIG. 5D, the input characterization vector 550 includes: a content field corresponding to a content name of or unique identifier for the predetermined content 530, a scene field corresponding to a unique identifier for the thematic scene 505A within the predetermined content 530, a time slice field corresponding to a unique identifier for the time slice 515A within the predetermined content 530, and a character field corresponding to a name of or unique identifier for a character 532A within the predetermined content 530. The input characterization vector 550 also includes a dialogue portion 552 corresponding to speech-to-text output associated with audio data collected from a user for the time slice 515A.

As shown in FIG. 5D, the input characterization vector 550 further includes a dialogue delivery portion 554 associated with one or more speech characteristics (e.g., intonation, cadence, accent, diction, articulation, pronunciation, and/or the like) associated with the audio data collected from a user for the time slice 515A. The input characterization vector 550 further includes a body pose portion 556 associated with one or more pose characteristics associated with the user for the time slice 515A. For example, the one or more pose characteristics correspond to an overall pose of the user for the time slice 515A (e.g., sitting, standing, crouching, etc.). As another example, the one or more pose characteristics correspond to rotational and/or translational coordinates for each joint, limb, and/or body portion of the user for the time slice 515A. As yet another example, the one or more pose characteristics correspond to rotational and/or translational coordinates for specific body parts (e.g., head, hands, and/or the like) of the user for the time slice 515A. The input characterization vector 550 further includes a gaze direction portion 558 associated with a directionality vector (e.g., X, Y, and/or focal point coordinates) for the gaze of the user for the time slice 515A.

As shown in FIG. 5D, the input characterization vector 550 further includes one or more other portions 559 characterizing the user during the time slice 515A. Those of ordinary skill in the art will appreciate from the present disclosure that the input characterization vector 550 is a non-limiting example and that the input characterization vector 550 may include other sub-divisions, identifiers, and/or portions in various implementations. As shown in FIG. 5D, the input characterization vector 550 is associated with a respective time slice-character tuple (e.g., time slice 515A-character 532A) that is also associated with the first template characterization vector 520A in FIG. 5C.

FIG. 5E illustrates abstract representations 574, 576, and 578 associated with portions of the input characterization vector 550 in FIG. 5D in accordance with some implementations. While pertinent features are shown, those of ordinary skill in the art will appreciate from the present disclosure that various other features have not been illustrated for the sake of brevity and so as not to obscure more pertinent aspects of the example implementations disclosed herein. As shown in FIG. 5E, the abstract representations 574, 576, and 578 are overlaid on a sub-timeline 575 for the time slice 515A. For example, as mentioned above, the time slice 515A spans from time 0:00:00 to time 0:00:08 (i.e., 8 seconds) within the predetermined content.

As shown in FIG. 5E, the abstract representation 574 corresponds to the dialogue delivery portion 554. For example, the abstract representation 574 corresponds to a composite or aggregate signal for the one or more speech characteristics included within the dialogue delivery portion 554 (e.g., intonation, cadence, accent, diction, articulation, pronunciation, and/or the like). In various implementations, as will be understood by one of ordinary skill in the art, the abstract representation 574 may be decomposed into a plurality of separate signals for each of the one or more speech characteristics included within the dialogue delivery portion 554.

As shown in FIG. 5E, the abstract representation 576 corresponds to the body pose portion 556. For example, the abstract representation 576 corresponds to a composite or aggregate signal for the one or more body pose characteristics included within the body pose portion 556 (e.g., overall pose; rotational and/or translational coordinates for each joint, limb, and/or body portion; rotational and/or translational coordinates for specific body parts; and/or the like). In various implementations, as will be understood by one of ordinary skill in the art, the abstract representation 576 may be decomposed into a plurality of separate signals for each of the one or more body pose characteristics included within the body pose portion 556.

As shown in FIG. 5E, the abstract representation 578 corresponds to the gaze direction portion 558. For example, the abstract representation 578 corresponds to a composite or aggregate signal for the directionality vector included within the gaze direction portion 558 (e.g., X, Y, and/or focal point coordinates). In various implementations, as will be understood by one of ordinary skill in the art, the abstract representation 578 may be decomposed into a plurality of separate signals for each component of the directionality vector included within the gaze direction portion 558.

FIG. 6A illustrates the abstract representation 576 of the body pose portion 556 of the input characterization vector 550 in FIG. 5D relative to an acceptability threshold 610 in accordance with some implementations. While pertinent features are shown, those of ordinary skill in the art will appreciate from the present disclosure that various other features have not been illustrated for the sake of brevity and so as not to obscure more pertinent aspects of the example implementations disclosed herein.

As shown in FIG. 6A, the abstract representation 576 of the body pose portion 556 of the input characterization vector 550 is overlaid on the sub-timeline 575 for the time slice 515A. For example, as mentioned above, the time slice 515A spans from time 0:00:00 to time 0:00:08 (i.e., 8 seconds) within the predetermined content. As shown in FIG. 6A, the abstract representation 576 of the body pose portion 556 is within an acceptability threshold 610 for the body pose portion. In various implementations, as will be understood by one of ordinary skill in the art, the acceptability threshold 610 may be tightened/narrowed or loosened/widened. In various implementations, as will be understood by one of ordinary skill in the art, the acceptability threshold 610 may be a signal with a predefined tolerance width instead of a predefined tolerance envelope as shown in FIG. 6A.
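
The envelope-style check suggested by FIG. 6A can be sketched as follows: a sampled body pose signal must stay inside a per-sample tolerance envelope for the time slice, and a fixed-tolerance-width variant simply derives the envelope from the template signal plus a constant width. Both helper functions and the sample values are illustrative assumptions.

```python
# Sketch of an envelope check for a signal sampled across a time slice.
from typing import List, Tuple

def within_envelope(signal: List[float], envelope: List[Tuple[float, float]]) -> bool:
    # envelope[i] holds the (low, high) bounds for sample i.
    return all(lo <= s <= hi for s, (lo, hi) in zip(signal, envelope))

def fixed_width_envelope(template: List[float], width: float) -> List[Tuple[float, float]]:
    # Fixed-tolerance-width variant: template signal plus/minus a constant.
    return [(t - width, t + width) for t in template]

# Usage: an 8-second slice sampled once per second (hypothetical values).
template = [0.2, 0.4, 0.5, 0.5, 0.6, 0.4, 0.3, 0.2]
ok = within_envelope([0.25, 0.35, 0.5, 0.55, 0.6, 0.45, 0.3, 0.2],
                     fixed_width_envelope(template, 0.1))  # True
```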

In some implementations, performance criteria associated with a respective time slice-character tuple (e.g., time slice 515A-character 532A) are satisfied when the one or more portions of the input characterization vector 550 for the respective time slice-character tuple are within acceptability thresholds associated with the one or more portions of the first template characterization vector 520A. For example, each portion (e.g., the dialogue portion 542A, the dialogue delivery portion 544A, the body pose portion 546A, the gaze direction portion 548A, or the like) may be associated with a different acceptability threshold.

In some implementations, the acceptability threshold changes based on previous input data associated with previous time slices satisfying the acceptability threshold. For example, if a threshold number of previous time slices satisfy the acceptability threshold, the controller 110 and/or the electronic device 120 narrow (e.g., tighten or constrict) the acceptability threshold. If a threshold number of previous time slices breach the acceptability threshold, the controller 110 and/or the electronic device 120 widen (e.g., loosen or relax) the acceptability threshold.

In some implementations, a degree of change in the acceptability threshold is a function of a level of breach by a previous time slice. For example, if previous input data for previous time slices breached the acceptability threshold by 5-10%, then the controller 110 and/or the electronic device 120 widen the acceptability threshold by 12%. However, if previous input data for previous time slices breached the acceptability threshold by 20-30%, then the controller 110 and/or the electronic device 120 widen the acceptability threshold by 35%.

In some implementations, changing the acceptability threshold based on previous user input for previous time slices enhances the user experience by adapting to users with different skill levels. Narrowing the acceptability threshold encourages the user to improve across time slices. Widening the acceptability threshold allows the user to satisfy the acceptability threshold even if the user inputs are not a close match to template inputs.
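
The adaptive behavior described above can be sketched as a small tuning function: narrow after a streak of satisfied slices, widen after a streak of breaches, with the widening scaled to the breach level. The 12% and 35% widening figures mirror the examples in the text; the streak length and the 10% narrowing factor are assumptions.

```python
# Hedged sketch of adaptive acceptability-threshold tuning.
from typing import List

def adjust_threshold(threshold: float, breach_levels: List[float], streak: int = 3) -> float:
    # breach_levels holds one entry per previous time slice: 0.0 when the
    # slice satisfied the threshold, else the fractional level of breach.
    recent = breach_levels[-streak:]
    if len(recent) < streak:
        return threshold
    if all(level == 0.0 for level in recent):      # streak of successes
        return threshold * 0.90                    # narrow (assumed factor)
    if all(level > 0.0 for level in recent):       # streak of breaches
        worst = max(recent)
        if worst <= 0.10:                          # breached by up to 10%
            return threshold * 1.12                # widen by 12%
        if worst <= 0.30:                          # breached by up to 30%
            return threshold * 1.35                # widen by 35%
        return threshold * 1.50                    # larger breach (assumed)
    return threshold
```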

FIG. 6B illustrates another example abstract representation 576A of a body pose portion 556A of an input characterization vector relative to an acceptability threshold in accordance with some implementations. While pertinent features are shown, those of ordinary skill in the art will appreciate from the present disclosure that various other features have not been illustrated for the sake of brevity and so as not to obscure more pertinent aspects of the example implementations disclosed herein.

As shown in FIG. 6B, the abstract representation 576A of the body pose portion 556A of an input characterization vector is overlaid on the sub-timeline 575 for the time slice 515A. For example, as mentioned above, the time slice 515A spans from time 0:00:00 to time 0:00:08 (i.e., 8 seconds) within the predetermined content. As shown in FIG. 6B, the abstract representation 576A of the body pose portion 556A breaches the acceptability threshold 610 for the body pose portion in both the 0:00:02 to 0:00:03 and 0:00:04 to 0:00:05 time windows.

FIG. 7 is a flowchart representation of a method 700 of performance-based progression of computer-generated reality (CGR) content in accordance with some implementations. In various implementations, the method 700 is performed by a device with one or more processors and non-transitory memory (e.g., the controller 110 in FIGS. 1 and 2; the electronic device 120 in FIGS. 1 and 3; or a suitable combination thereof) or a component thereof. In some implementations, the method 700 is performed by processing logic, including hardware, firmware, software, or a combination thereof. In some implementations, the method 700 is performed by a processor executing code stored in a non-transitory computer-readable medium (e.g., a memory).

For example, the electronic device 120 obtains audio data associated with an attempt by a user to deliver dialogue associated with a first character for a first time slice within predetermined content (e.g., a set of lines for a character associated with the first time slice of a movie or TV episode). Continuing with this example, the electronic device 120 also obtains body pose data and eye tracking data associated with an attempt by the user to mimic the acting, body language, facial expressions, actions, and/or the like associated with the first character for the first time slice within the predetermined content (e.g., acting for the character associated with the first time slice of the movie or TV episode). In other words, the user attempts to mimic the character within the predetermined content (e.g., dialogue and the delivery thereof) and is scored on a time-slice-by-time-slice basis by the device. If the user's attempt to mimic a character for the first time slice satisfies performance criteria, the device updates the CGR environment such that the user progresses to a next time slice of the predetermined content. However, if the user's attempt to mimic the character for the first time slice does not satisfy the performance criteria, the device maintains the CGR environment and allows the user to reattempt the first time slice of the predetermined content.

As represented by block 7-1, the method 700 includes obtaining first input data while presenting a CGR environment from the perspective of a first character associated with a first time slice within predetermined content. For example, the first time slice corresponds to a sampling period associated with a thematic scene within a theatrical play, movie, TV episode, historical event, fictional story, live event, etc. In some implementations, the first input data is obtained from a user of the device via image sensors, microphones, and/or the like that are local to and/or remote from the device. As shown in FIG. 5D, the input characterization vector 550 includes a dialogue portion 552 associated with audio data from a user, a dialogue delivery portion 554 associated with audio data from the user, a body pose portion 556 associated with body pose data from the user, and a gaze direction portion 558 associated with eye tracking data from the user.

In some implementations, the first input data corresponds to a set of lines (and their intonation, cadence, etc.) associated with the first character for the first time slice within the predetermined content. In some implementations, the first input data corresponds to a body pose (or action) associated with the first character for the first time slice within the predetermined content. In some implementations, the first input data corresponds to facial expressions associated with the first character for the first time slice within the predetermined content. In some implementations, the first input data corresponds to a gaze direction associated with the first character for the first time slice within the predetermined content.

In some implementations, the first input data is obtained from at least one of: a microphone, an inertial measurement unit (IMU), an accelerometer, a gyroscope, an exterior-facing image sensor, a gaze tracking device, or one or more physiological sensors. For example, the input devices may be local to and/or remote from the device.

In some implementations, the first time slice corresponds to a predetermined sampling period for the predetermined content. For example, the first time slice corresponds to a predetermined global length for the predetermined content, such as X-second time slices.

In some implementations, the first time slice corresponds to a predetermined sampling period for a current thematic scene within the predetermined content. For example, the first time slice corresponds to a predetermined time slice for a current thematic scene within the predetermined content, such as a time slice of dialogue or a complete monologue.

As represented by block 7-2, the method 700 includes determining whether or not the first input data satisfies first performance criteria associated with the first character for the first time slice within the predetermined content. For example, the device determines whether the lines delivered by the user match a predetermined set of lines associated with the first character for the first time slice within a tolerance threshold based on speech processing. As another example, the device determines whether the acting of the user matches a predetermined acting sequence associated with the first character for the first time slice within a tolerance threshold based on body pose and gaze tracking.

In some implementations, the first input data includes audio information, and the first performance criteria associated with the first character for the first time slice within the predetermined content are satisfied when the audio information matches a set of predetermined dialogue that corresponds to the first character for the first time slice within the predetermined content. As shown in FIG. 5D, the input characterization vector 550 includes a dialogue portion 552.
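
One way to picture such a dialogue match under a tolerance threshold is sketched below: the speech-to-text output is compared with the predetermined set of lines using a word-level overlap ratio. The matching metric and the 0.8 tolerance are assumptions; real systems might use phoneme- or embedding-based comparisons instead.

```python
# Illustrative dialogue match: word-level overlap against a tolerance.
import re
from typing import List

def normalize(text: str) -> List[str]:
    # Lowercase and strip punctuation so "be," matches "be".
    return re.findall(r"[a-z']+", text.lower())

def dialogue_matches(stt_output: str, template_lines: str, tolerance: float = 0.8) -> bool:
    spoken = normalize(stt_output)
    expected = normalize(template_lines)
    if not expected:
        return True
    hits = sum(1 for a, b in zip(spoken, expected) if a == b)
    return hits / len(expected) >= tolerance

print(dialogue_matches("to be or not to be", "To be, or not to be"))  # True
```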

In some implementations, the first input data includes audio information, and the first performance criteria associated with the first character for the first time slice within the predetermined content are satisfied when the audio information matches a set of predetermined dialogue that corresponds to the first character for the first time slice within the predetermined content and also matches a set of predetermined characteristics for the delivery of that dialogue by the first character for the first time slice within the predetermined content. For example, the set of predetermined characteristics for the set of predetermined dialogue includes speech characteristics such as intonation, accent, cadence, pitch, pronunciation, and/or the like. As shown in FIG. 5D, the input characterization vector 550 includes a dialogue delivery portion 554.

In some implementations, the first input data includes body pose information, and the first performance criteria associated with the first character for the first time slice within the predetermined content are satisfied when the body pose information matches a predetermined body pose that corresponds to the first character for the first time slice within the predetermined content. For example, the predetermined body pose corresponds to a predetermined action performed by the first character for the first time slice within the predetermined content, such as walking in a particular path/direction, moving or picking up items, moving limbs or pointing, etc. In another example, the predetermined body pose corresponds to particular pose characteristics such as rotational and/or translational coordinates for select body parts or the like. As shown in FIG. 5D, the input characterization vector 550 includes a body pose portion 556.

In some implementations, the first input data includes gaze information, and the first performance criteria associated with the first character for the first time slice within the predetermined content are satisfied when the gaze information matches a predetermined gaze direction that corresponds to the first character for the first time slice within the predetermined content. As shown in FIG. 5D, the input characterization vector 550 includes a gaze direction portion 558.

If the first input data satisfies the first performance criteria associated with the first character for the first time slice within the predetermined content (“YES” branch from block 7-2), the method 700 continues to block 7-3. If the first input data does not satisfy the first performance criteria associated with the first character for the first time slice within the predetermined content (“NO” branch from block 7-2), the method 700 continues to block 7-4.

As represented by block 7-3, the method 700 includes updating the CGR environment from the perspective of the first character associated with a second time slice in the predetermined content. In some implementations, the predetermined content seamlessly advances to the next set of lines or next scene within the predetermined content. In some implementations, the user is able to select a different character for the same time slice or thematic scene. In some implementations, the user is able to select a different time slice or thematic scene for the same character. In some implementations, the user selects a different character and a different time slice or thematic scene.

In some implementations, the method 700 further includes determining a score associated with the first input data and presenting the score that corresponds to the first input data. For example, the device presents CGR content associated with a current aggregate score for the user across multiple time slices and also a score for a most recent time slice. For example, the score represents a delta between the user input and the ground truth for the character within the predetermined content.
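
As a hedged illustration, a per-slice score can be expressed as a delta between the user's input characterization vector and the template ("ground truth") vector, averaged over the compared portions; the 0-100 scale and the averaging scheme below are assumptions.

```python
# Illustrative per-slice score: smaller delta from the template yields
# a higher score on an assumed 0-100 scale.
from typing import Dict, List

def slice_score(input_vector: Dict[str, List[float]],
                template_vector: Dict[str, List[float]]) -> float:
    deltas = []
    for name, tmpl in template_vector.items():
        inp = input_vector.get(name, [0.0] * len(tmpl))
        deltas.append(sum(abs(a - b) for a, b in zip(inp, tmpl)) / max(len(tmpl), 1))
    mean_delta = sum(deltas) / max(len(deltas), 1)
    return max(0.0, 100.0 * (1.0 - mean_delta))
```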

In some implementations, the method 700 further includes obtaining second input data while presenting the CGR environment from the perspective of the first character associated with a second time slice within predetermined content. In some implementations, the method 700 further includes determining whether or not the second input data satisfies second performance criteria associated with the first character for the second time slice within the predetermined content. In some implementations, the first performance criteria associated with the first character for the first time slice within the predetermined content are different from second performance criteria associated with the first character for a second time slice within the predetermined content. In some implementations, the first performance criteria associated with the first character for the first time slice within the predetermined content are different from third performance criteria associated with a second character for the first time slice within the predetermined content.

As represented by block 7-4, the method 700 includes maintaining the CGR environment from the perspective of the first character associated with the first time slice in the predetermined content. In some implementations, the predetermined content is reset to the beginning of the first time slice, for example, a thematic scene or dialogue portion (e.g., a set of lines).

FIGS. 8A and 8B illustrate a flowchart representation of a method 800 of performance-based progression of computer-generated reality (CGR) content in accordance with some implementations. In various implementations, the method 800 is performed by a device with one or more processors and non-transitory memory (e.g., the controller 110 in FIGS. 1 and 2; the electronic device 120 in FIGS. 1 and 3; or a suitable combination thereof) or a component thereof. In some implementations, the method 800 is performed by processing logic, including hardware, firmware, software, or a combination thereof. In some implementations, the method 800 is performed by a processor executing code stored in a non-transitory computer-readable medium (e.g., a memory).

As represented by block 8-1, the method 800 includes obtaining selection inputs indicating a character and thematic scene within predetermined content. For example, the device detects selection inputs from the user (e.g., a voice command, touch inputs relative to a selection menu, or the like) indicating a thematic scene and a character that the user wishes to attempt to mimic.

As one example, prior to obtaining the selection inputs, the device presents CGR content associated with the predetermined content, such as a movie, television episode, or other media content, via the device (e.g., a near-eye system, mobile phone, tablet, laptop, or the like). Continuing with this example, while presenting the CGR content, the device displays a prompt indicating that a performance-based progression mode is available for a current thematic scene (or future thematic scene) within the predetermined content whereby the user is able to select a character or role within the CGR content to mimic (or act out) in order to progress the CGR content.

As another example, prior to obtaining the selection inputs, a second device (e.g., a television, tablet, laptop, or the like) separate from the device displays predetermined flat media content, such as a movie, television episode, or the like, to the user. Continuing with this example, the device displays a prompt indicating that a performance-based progression mode is available for a current thematic scene (or future thematic scene) within the predetermined flat media content whereby the user is able to select a character or role to mimic (or act out) in order to progress CGR content associated with the predetermined flat media content.

As represented by block 8-2, the method 800 includes presenting a CGR environment that corresponds to the thematic scene from the character's point-of-view (POV). For example, the device presents CGR content to the user such that the user “sees” through the eyes of the character while the CGR environment corresponds to the background associated with the thematic scene.

As represented by block 8-3, the method 800 includes obtaining input data for time slice X. With reference to FIG. 4, for example, the data processing architecture 400 obtains audio data 402A, body pose data 402B, and/or eye tracking data 402C associated with the user.

In some implementations, as represented by block 8-3a, the input data includes audio data associated with dialogue. With reference to FIG. 4, for example, the NLP 252 performs STT processing on the audio data 402A.

In some implementations, as represented by block 8-3b, the input data includes audio data associated with delivery of the dialogue. With reference to FIG. 4, for example, the speech assessor 254 determines one or more speech characteristics associated with the audio data 402A (e.g., intonation, cadence, accent, diction, articulation, pronunciation, and/or the like).

In some implementations, as represented by block 8-3c, the input data includes body pose data. With reference to FIG. 4, for example, the body pose interpreter 256 determines one or more pose characteristics associated with the body pose data 402B.

In some implementations, as represented by block 8-3d, the input data includes eye tracking data. With reference to FIG. 4, for example, the gaze direction determiner 258 determines a directionality vector associated with the eye tracking data 402C.

As represented by block 8-4, the method 800 includes generating an input characterization vector for time slice X based on the input data. With reference to FIGS. 4 and 5D, for example, the input characterization engine 250 generates an input characterization vector 550 that includes a dialogue portion 552 based on the output of the NLP 252, a dialogue delivery portion 554 based on the output of the speech assessor 254, a body pose portion 556 based on the output of the body pose interpreter 256, and a gaze direction portion 558 based on the output of the gaze direction determiner 258.
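
A sketch of this assembly step follows. The four stub functions stand in for the NLP 252, speech assessor 254, body pose interpreter 256, and gaze direction determiner 258 of FIG. 4; their signatures and return values are assumptions made only so the example is self-contained.

```python
# Hypothetical stand-ins for the per-modality modules of FIG. 4.
def speech_to_text(audio):           # NLP 252 (STT output)
    return "recognized dialogue"

def speech_characteristics(audio):   # speech assessor 254
    return [0.0]

def pose_characteristics(pose):      # body pose interpreter 256
    return [0.0]

def gaze_vector(eye):                # gaze direction determiner 258
    return (0.0, 0.0)

def build_input_vector(audio_data, body_pose_data, eye_tracking_data, ids):
    # Assemble an input characterization vector for the current
    # time slice-character tuple from the per-modality outputs.
    return {
        "content_id": ids["content"],
        "scene_id": ids["scene"],
        "time_slice_id": ids["time_slice"],
        "character_id": ids["character"],
        "dialogue": speech_to_text(audio_data),
        "dialogue_delivery": speech_characteristics(audio_data),
        "body_pose": pose_characteristics(body_pose_data),
        "gaze_direction": gaze_vector(eye_tracking_data),
    }
```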

As represented by block 8-5, the method 800 includes obtaining a template characterization vector for time slice X. With reference to FIGS. 4 and 5C, for example, the template selector 264 selects the first template characterization vector 520A from the template library 266 based on the time slice-character tuple (e.g., time slice 515A-character 532A).

As represented by block 8-6, the method 800 includes determining whether or not the input data satisfies performance criteria for time slice X based on a comparison between the input characterization vector for time slice X and the template characterization vector for time slice X. With reference to FIGS. 4, 5C, and 5D, the performance assessment engine 268 determines whether or not the input data satisfies performance criteria for the character 532A for the time slice 515A based on a comparison between the input characterization vector 550 in FIG. 5D for time slice 515A and the first template characterization vector 520A in FIG. 5C for time slice 515A.

As represented by block 8-7, the method 800 includes progressing to time slice X+1 in response to determining that the input data satisfies the performance criteria for time slice X. In some implementations, as represented by block 8-7a, the method 800 includes updating the CGR environment for time slice X+1. In some implementations, as represented by block 8-7b, the method 800 includes adjusting the difficulty for time slice X+1. For example, the device increases the difficulty for time slice X+1 by modifying the performance criteria for time slice X+1, such as narrowing one or more acceptability thresholds.

As represented by block 8-8, the method 800 optionally includes reattempting time slice X in response to determining that the input data does not satisfy the performance criteria for time slice X. In some implementations, as represented by block 8-8a, the method 800 includes maintaining the CGR environment for time slice X. In some implementations, as represented by block 8-8b, the method 800 includes adjusting the difficulty for time slice X. For example, the device decreases the difficulty for the reattempt of time slice X by modifying the performance criteria for time slice X, such as widening one or more acceptability thresholds.

As represented by block 8-9, the method 800 optionally includes presenting supplementary content in response to determining that the input data does not satisfy the performance criteria for time slice X. In some implementations, as represented by block 8-9a, the method 800 includes presenting a prompt. For example, the device presents CGR content including a prompt notification to reattempt the time slice X. In some implementations, as represented by block 8-9b, the method 800 includes presenting a tutorial associated with time slice X. For example, the device presents CGR content including a tutorial on delivering the dialogue associated with the character for the time slice X (e.g., a coaching session). In some implementations, as represented by block 8-9c, the method 800 includes presenting ground truth content associated with time slice X. For example, the device presents CGR content including a third-person view of the portion of the predetermined content associated with time slice X so that the user is able to see the ground truth content for time slice X.

While various aspects of implementations within the scope of the appended claims are described above, it should be apparent that the various features of implementations described above may be embodied in a wide variety of forms and that any specific structure and/or function described above is merely illustrative. Based on the present disclosure one skilled in the art should appreciate that an aspect described herein may be implemented independently of any other aspects and that two or more of these aspects may be combined in various ways. For example, an apparatus may be implemented and/or a method may be practiced using any number of the aspects set forth herein. In addition, such an apparatus may be implemented and/or such a method may be practiced using other structure and/or functionality in addition to or other than one or more of the aspects set forth herein.

It will also be understood that, although the terms “first,” “second,” etc. may be used herein to describe various elements, these elements should not be limited by these terms. These terms are only used to distinguish one element from another. For example, a first node could be termed a second node, and, similarly, a second node could be termed a first node, without changing the meaning of the description, so long as all occurrences of the “first node” are renamed consistently and all occurrences of the “second node” are renamed consistently. The first node and the second node are both nodes, but they are not the same node.

The terminology used herein is for the purpose of describing particular implementations only and is not intended to be limiting of the claims. As used in the description of the implementations and the appended claims, the singular forms “a,” “an,” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will also be understood that the term “and/or” as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items. It will be further understood that the terms “comprises” and/or “comprising,” when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.

As used herein, the term “if” may be construed to mean “when” or “upon” or “in response to determining” or “in accordance with a determination” or “in response to detecting,” that a stated condition precedent is true, depending on the context. Similarly, the phrase “if it is determined [that a stated condition precedent is true]” or “if [a stated condition precedent is true]” or “when [a stated condition precedent is true]” may be construed to mean “upon determining” or “in response to determining” or “in accordance with a determination” or “upon detecting” or “in response to detecting” that the stated condition precedent is true, depending on the context.

What is claimed is:
1. A method comprising: at a device including non-transitory memory and one or more processors coupled with the non-transitory memory: while presenting a computer-generated reality (CGR) environment from the perspective of a first character associated with a first time slice within predetermined content including a timeline with a sequence of time slices, wherein the sequence of time slices includes the first time slice and a second time slice, obtaining first input data; determining whether or not the first input data satisfies first performance criteria associated with the first character for the first time slice; and in response to determining that the first input data satisfies the first performance criteria associated with the first character for the first time slice, updating the CGR environment from the perspective of the first character associated with the second time slice.
2. The method of claim 1 further comprising, in response to determining that the first input data does not satisfy the first performance criteria associated with the first character for the first time slice, maintaining the CGR environment from the perspective of the first character associated with the first time slice.
3. The method of claim 2 further comprising, in response to determining that the first input data does not satisfy the first performance criteria associated with the first character for the first time slice, presenting a prompt notification.
4. The method of claim 2 further comprising, in response to determining that the first input data does not satisfy the first performance criteria associated with the first character for the first time slice, presenting a tutorial associated with the first character for the first time slice.
5. The method of claim 2 further comprising, in response to determining that the first input data does not satisfy the first performance criteria associated with the first character for the first time slice, presenting a third-person view of the predetermined content for the first time slice.
6. The method of claim 1, wherein the first input data is obtained from at least one of: a microphone, an inertial measurement unit (IMU), an accelerometer, a gyroscope, an exterior-facing image sensor, a gaze tracking device, or one or more physiological sensors.
7. The method of claim 1, wherein the first input data includes audio information, and wherein the first performance criteria associated with the first character for the first time slice is satisfied when the audio information matches a set of predetermined dialogue that corresponds to the first character for the first time slice.
8. The method of claim 1, wherein the first input data includes body pose information, and wherein the first performance criteria associated with the first character for the first time slice is satisfied when the body pose information matches predetermined body pose information that corresponds to the first character for the first time slice.
9. The method of claim 1, wherein the first input data includes facial information, and wherein the first performance criteria associated with the first character for the first time slice is satisfied when the facial information matches predetermined facial expression information that corresponds to the first character for the first time slice.
10. The method of claim 1, wherein the first input data includes gaze information, and wherein the first performance criteria associated with the first character for the first time slice is satisfied when the gaze information matches predetermined gaze direction information that corresponds to the first character for the first time slice.
11. The method of claim 1, further comprising: while presenting the CGR environment from the perspective of the first character associated with the second time slice, obtaining second input data; and determining whether or not the second input data satisfies second performance criteria associated with the first character for the second time slice.
12. The method of claim 1, further comprising: determining a score associated with the first input data; and presenting the score that corresponds to the first input data.
13. The method of claim 1, wherein the first time slice corresponds to a predetermined sampling period for the predetermined content.
14. The method of claim 1, wherein the first time slice corresponds to a predetermined sampling period for a current thematic scene within the predetermined content.
15. A device comprising: one or more processors; a non-transitory memory; and one or more programs stored in the non-transitory memory, which, when executed by the one or more processors, cause the device to: while presenting a computer-generated reality (CGR) environment from the perspective of a first character associated with a first time slice within predetermined content including a timeline with a sequence of time slices, wherein the sequence of time slices includes the first time slice and a second time slice, obtain first input data; determine whether or not the first input data satisfies first performance criteria associated with the first character for the first time slice; and in response to determining that the first input data satisfies the first performance criteria associated with the first character for the first time slice, update the CGR environment from the perspective of the first character associated with the second time slice.
16. The device of claim 15, wherein the one or more programs further cause the device to: in response to determining that the first input data does not satisfy the first performance criteria associated with the first character for the first time slice, maintain the CGR environment from the perspective of the first character associated with the first time slice.
17. The device of claim 15, wherein the first input data includes audio information, and wherein the first performance criteria associated with the first character for the first time slice is satisfied when the audio information matches a set of predetermined dialogue that corresponds to the first character for the first time slice.
18. The device of claim 15, wherein the first input data includes body pose information, and wherein the first performance criteria associated with the first character for the first time slice is satisfied when the body pose information matches predetermined body pose information that corresponds to the first character for the first time slice.
19. The device of claim 15, wherein the first input data includes facial information, and wherein the first performance criteria associated with the first character for the first time slice is satisfied when the facial information matches predetermined facial expression information that corresponds to the first character for the first time slice.
20. The device of claim 15, wherein the first input data includes gaze information, and wherein the first performance criteria associated with the first character for the first time slice is satisfied when the gaze information matches predetermined gaze direction information that corresponds to the first character for the first time slice.
21. A non-transitory memory storing one or more programs, which, when executed by one or more processors of a device, cause the device to: while presenting a computer-generated reality (CGR) environment from the perspective of a first character associated with a first time slice within predetermined content including a timeline with a sequence of time slices, wherein the sequence of time slices includes the first time slice and a second time slice, obtain first input data; determine whether or not the first input data satisfies first performance criteria associated with the first character for the first time slice; and in response to determining that the first input data satisfies the first performance criteria associated with the first character for the first time slice, update the CGR environment from the perspective of the first character associated with the second time slice.
22. The non-transitory memory of claim 21, wherein the one or more programs further cause the device to: in response to determining that the first input data does not satisfy the first performance criteria associated with the first character for the first time slice, maintain the CGR environment from the perspective of the first character associated with the first time slice.
23. The non-transitory memory of claim 21, wherein the first input data includes audio information, and wherein the first performance criteria associated with the first character for the first time slice is satisfied when the audio information matches a set of predetermined dialogue that corresponds to the first character for the first time slice.
24. The non-transitory memory of claim 21, wherein the first input data includes body pose information, and wherein the first performance criteria associated with the first character for the first time slice is satisfied when the body pose information matches predetermined body pose information that corresponds to the first character for the first time slice.
25. The non-transitory memory of claim 21, wherein the first input data includes facial information, and wherein the first performance criteria associated with the first character for the first time slice is satisfied when the facial information matches predetermined facial expression information that corresponds to the first character for the first time slice.
26. The non-transitory memory of claim 21, wherein the first input data includes gaze information, and wherein the first performance criteria associated with the first character for the first time slice is satisfied when the gaze information matches predetermined gaze direction information that corresponds to the first character for the first time slice.